Goto

Collaborating Authors

 minimax distance


A Generic Framework for Clustering Vehicle Motion Trajectories

Hoseini, Fazeleh S., Rahrovani, Sadegh, Chehreghani, Morteza Haghir

arXiv.org Machine Learning

The development of autonomous vehicles requires having access to a large amount of data in the concerning driving scenarios. However, manual annotation of such driving scenarios is costly and subject to the errors in the rule-based trajectory labeling systems. To address this issue, we propose an effective non-parametric trajectory clustering framework consisting of five stages: (1) aligning trajectories and quantifying their pairwise temporal dissimilarities, (2) embedding the trajectory-based dissimilarities into a vector space, (3) extracting transitive relations, (4) embedding the transitive relations into a new vector space, and (5) clustering the trajectories with an optimal number of clusters. We investigate and evaluate the proposed framework on a challenging real-world dataset consisting of annotated trajectories. We observe that the proposed framework achieves promising results, despite the complexity caused by having trajectories of varying length. Furthermore, we extend the framework to validate the augmentation of the real dataset with synthetic data generated by a Generative Adversarial Network (GAN) where we examine whether the generated trajectories are consistent with the true underlying clusters.


Memory-Efficient Sampling for Minimax Distance Measures

Hoseini, Fazeleh Sadat, Chehreghani, Morteza Haghir

arXiv.org Machine Learning

Learning a proper representation is usually the first step in every machine learning and data analytic tasks. Some recent representation learning methods have been developed in the context of deep learning [1], which are highly parameterized and require a huge amount of labeled data for training. On the other hand, there are methods that learn a proper representation in an unsupervised way and usually do not require learning free parameters. A category of unsupervised representations and distance measures, called link-based distance [2, 3], take into account all the paths between the objects represented in a graph. These distance measures are often obtained by inverting the Laplacian of the base distance matrix in the context of Markov diffusion kernel [2].


Hierarchical Correlation Clustering and Tree Preserving Embedding

Chehreghani, Morteza Haghir

arXiv.org Machine Learning

We propose a hierarchical correlation clustering method that extends the well-known correlation clustering to produce hierarchical clusters. We then investigate embedding the respective hierarchy to be used for (tree preserving) embedding and feature extraction. We study the connection of such an embedding to single linkage embedding and minimax distances, and in particular study minimax distances for correlation clustering. Finally, we demonstrate the performance of our methods on several UCI and 20 newsgroup datasets.


Global Optimal Path-Based Clustering Algorithm

Liu, Qidong, Zhang, Ruisheng

arXiv.org Machine Learning

Abstract--Combinatorial optimization problems for clustering are known to be NPhard. Most optimization methods are not able to find the global optimum solution for all datasets. T o solve this problem, we propose a global optimal path-based clustering (GOPC) algorithm in this paper. The GOPC algorithm is based on two facts: (1) medoids have the minimum degree in their clusters; (2) the minimax distance between two objects in one cluster is smaller than the minimax distance between objects in different clusters. Extensive experiments are conducted on synthetic and real-world datasets to evaluate the performance of the GOPC algorithm. The results on synthetic datasets show that the GOPC algorithm can recognize all kinds of clusters regardless of their shapes, sizes, or densities. Experimental results on real-world datasets demonstrate the effectiveness and efficiency of the GOPC algorithm. In addition, the GOPC algorithm needs only one parameter, i.e., the number of clusters, which can be estimated by the decision graph. The advantages mentioned above make GOPC a good candidate as a general clustering algorithm. In clustering algorithms, measuring the dissimilarity between any pair of points is very important. The most commonly used dissimilarity method is Euclidean distance. However, in many real-world applications of pattern classification and data mining, we are often confronted with high-dimensional features of the investigated data, which adversely affects clustering performance due to the curse of dimensionality [9], [10]. It is widely acknowledged that many real-world datasets stringently obey low-rank rules, which means that they are distributed on a manifold of a dimensionality that is often much lower than that of ambient space [11], [12], [13].


Nonparametric feature extraction based on Minimax distance

Chehreghani, Morteza Haghir

arXiv.org Artificial Intelligence

We investigate the use of Minimax distances to extract in a nonparametric way the features that capture the unknown underlying patterns and structures in the data. We develop a general-purpose framework to employ Minimax distances with many machine learning methods that perform on numerical data. For this purpose, first, we compute the pairwise Minimax distances between the objects, using the equivalence of Minimax distances over a graph and over a minimum spanning tree constructed on that. Then, we perform an embedding of the pairwise Minimax distances into a new vector space, such that their squared Euclidean distances in the new space equal to the pairwise Minimax distances in the original space. In the following, we study the case of having multiple pairwise Minimax matrices, instead of a single one. Thereby, we propose an embedding via first summing up the centered matrices and then performing an eigenvalue decomposition. Finally, we perform several experimental studies to illustrate the effectiveness of our framework.


Nonparametric Feature Extraction from Dendrograms

Chehreghani, Morteza Haghir, Chehreghani, Mostafa Haghir

arXiv.org Machine Learning

We study nonparametric feature extraction from hierarchies. The commonly used Minimax distance measures correspond to building a dendrogram with single linkage criterion, with the definition of specific forms of a level function and a distance function over that. Therefore, we develop a generalized framework wherein different distance measures can be inferred from different types of dendrograms, level functions and distance functions. Via an appropriate embedding, we compute a vector-based representation of the inferred distances, in order to enable many numerical machine learning algorithms to employ such distances. Then, we study the aggregation of different dendrogram-based distances respectively in solution space and in representation space in the spirit of deep learning models. Finally, we demonstrate the effectiveness of our approach via numerical studies.


Classification with Minimax Distance Measures

Chehreghani, Morteza Haghir (Xerox Research Centre Europe)

AAAI Conferences

Minimax distance measures provide an effective way to capture the unknown underlying patterns and classes of the data in a non-parametric way. We develop a general-purpose framework to employ Minimax distances with any classification method that performs on numerical data. For this purpose, we establish a two-step strategy. First, we compute the pairwise Minimax distances between the objects, using the equivalence of Minimax distances over a graph and over a minimum spanning tree constructed on that. Then, we perform an embedding of the pairwise Minimax distances into a new vector space, such that their squared Euclidean distances in the new space are equal to their Minimax distances in the original space. We also consider the cases where multiple pairwise Minimax matrices are given, instead of a single one. Thereby, we propose an embedding via first summing up the centered matrices and then performing an eigenvalue decomposition. We experimentally validate our framework on different synthetic and real-world datasets.


Walking on Minimax Paths for k-NN Search

Kim, Kye-Hyeon (Pohang University of Science and Technology) | Choi, Seungjin (Pohang University of Science and Technology)

AAAI Conferences

Link-based dissimilarity measures, such as shortest path or Euclidean commute time distance, base their distance on paths between nodes of a weighted graph. These measures are known to be better suited to data manifold with nonconvex-shaped clusters, compared to Euclidean distance, so that k -nearest neighbor (NN) search is improved in such metric spaces. In this paper we present a new link-based dissimilarity measure based on minimax paths between nodes. Two main benefits of minimax path-based dissimilarity measure are: (1) only a subset of paths is considered to make it scalable, while Euclidean commute time distance considers all possible paths; (2) it better captures nonconvex-shaped cluster structure, compared to shortest path distance. We define the total cost assigned to a path between nodes as L p norm of intermediate costs of edges involving the path, showing that minimax path emerges from our L p norm over paths framework. We also define minimax distance as the intermediate cost of the longest edge on the minimax path, then present a greedy algorithm to compute k smallest minimax distances between a query and N data points in O(log N + k log k) time. Numerical experiments demonstrate that our minimax k-NN algorithm reduce the search time by several orders of magnitude, compared to existing methods, while the quality of k -NN search is significantly improved over Euclidean distance.